feat(kas): enable topology aware routing on kube-apiserver ClusterIP service#8732
feat(kas): enable topology aware routing on kube-apiserver ClusterIP service#8732rrp-bot wants to merge 3 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds topology-aware routing to the Kubernetes API server (KAS) services by setting the 🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @rrp-bot. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rrp-bot The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/kas/service.go (1)
109-112:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRoute strategy is missing the topology-aware routing annotation.
The Route strategy creates a ClusterIP service (line 111), but unlike the LoadBalancer+private case (lines 98-102), it does not set the
service.kubernetes.io/topology-modeannotation. This causesTestReconcileServiceTopologyAwareRoutingto fail for the "When Route strategy" test case (service_test.go lines 138-146), which expects the annotation to be present.🔧 Proposed fix to add topology annotation for Route strategy
case hyperv1.Route: if hcp.Spec.Platform.Type != hyperv1.IBMCloudPlatform || svc.Spec.Type != corev1.ServiceTypeNodePort { svc.Spec.Type = corev1.ServiceTypeClusterIP + // Enable topology aware routing so that callers (KCM, scheduler, operators, etc.) + // are routed to a KAS pod in their own AZ, avoiding cross-AZ data transfer charges. + // KAS pods are spread across zones via requiredDuringScheduling anti-affinity, + // satisfying the >=1 endpoint per zone precondition for TAR to activate. + svc.Annotations["service.kubernetes.io/topology-mode"] = "Auto" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go` around lines 109 - 112, In the hyperv1.Route branch (case hyperv1.Route) where you set svc.Spec.Type = corev1.ServiceTypeClusterIP, also add the topology-aware routing annotation svc.ObjectMeta.Annotations["service.kubernetes.io/topology-mode"] = "TopologyAndEndpoints" (same as the LoadBalancer+private branch) so the Route strategy produces the topology-aware routing annotation expected by TestReconcileServiceTopologyAwareRouting; ensure you reference svc and hcp.Spec.Platform.Type in the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go`:
- Around line 138-146: The test fails because ReconcileService in service.go
does not set the "service.alpha.openshift.io/topology-mode" annotation for the
Route publishing strategy; update the Route branch in ReconcileService (the
logic handling hyperv1.ServicePublishingStrategy{Type: hyperv1.Route}) to add
the same topology-mode annotation used for the private LoadBalancer case (set to
"internal" or the value used elsewhere) on the ClusterIP service object so the
expectTARAnnotation check in the test passes.
---
Outside diff comments:
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go`:
- Around line 109-112: In the hyperv1.Route branch (case hyperv1.Route) where
you set svc.Spec.Type = corev1.ServiceTypeClusterIP, also add the topology-aware
routing annotation
svc.ObjectMeta.Annotations["service.kubernetes.io/topology-mode"] =
"TopologyAndEndpoints" (same as the LoadBalancer+private branch) so the Route
strategy produces the topology-aware routing annotation expected by
TestReconcileServiceTopologyAwareRouting; ensure you reference svc and
hcp.Spec.Platform.Type in the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c64468be-1fe1-4e2d-bf83-32eabc0a4706
📒 Files selected for processing (2)
control-plane-operator/controllers/hostedcontrolplane/kas/service.gocontrol-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
| name: "When Route strategy it should set topology-mode annotation on ClusterIP service", | ||
| hcp: &hyperv1.HostedControlPlane{ | ||
| Spec: hyperv1.HostedControlPlaneSpec{ | ||
| Platform: hyperv1.PlatformSpec{Type: hyperv1.AWSPlatform}, | ||
| }, | ||
| }, | ||
| strategy: hyperv1.ServicePublishingStrategy{Type: hyperv1.Route}, | ||
| expectTARAnnotation: true, | ||
| }, |
There was a problem hiding this comment.
This test case will fail due to incomplete implementation.
The test expects ReconcileService to set the topology-mode annotation for Route strategy, but the implementation in service.go does not add the annotation for the Route case (only for LoadBalancer+private). This test case will fail until the Route case is updated to add the annotation.
See the comment on service.go lines 109-112 for the required fix.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go`
around lines 138 - 146, The test fails because ReconcileService in service.go
does not set the "service.alpha.openshift.io/topology-mode" annotation for the
Route publishing strategy; update the Route branch in ReconcileService (the
logic handling hyperv1.ServicePublishingStrategy{Type: hyperv1.Route}) to add
the same topology-mode annotation used for the private LoadBalancer case (set to
"internal" or the value used elsewhere) on the ClusterIP service object so the
expectTARAnnotation check in the test passes.
Summary
Set
service.kubernetes.io/topology-mode=Autoon thekube-apiserverClusterIP service so that callers — KCM, scheduler, OAPI, oauth-apiserver, OLM, konnectivity-server, and all operators — are routed to a KAS pod in their own Availability Zone, avoiding cross-AZ data transfer charges.Background
AWS charges $0.01/GB per direction for traffic crossing AZ boundaries ($0.02/GB round trip). In a HyperShift management cluster running many hosted control planes, every component calls
kube-apiserverconstantly via the ClusterIP service. Without Topology Aware Routing (TAR), OVN distributes these calls across all KAS pods regardless of AZ — paying the cross-AZ tax on a large fraction of requests.How TAR works
TAR is a native Kubernetes feature (stable since 1.27, GA in 1.33) that programs OVN/kube-proxy to prefer zone-local endpoints. The annotation goes on the Service being called — all callers benefit automatically with no changes on their side. There is zero per-request overhead.
Why this is safe
requiredDuringSchedulingpod anti-affinity ontopology.kubernetes.io/zone, guaranteeing ≥1 endpoint per zone in HA mode.Changes
kas/service.go: Addservice.kubernetes.io/topology-mode=Autoto the ClusterIP path inReconcileServiceand toReconcileServiceClusterIP(Azure/KubeVirt path)kas/service_test.go: AddTestReconcileServiceTopologyAwareRoutingandTestReconcileServiceClusterIPTopologyAwareRoutingVerification
After deployment, confirm TAR is active:
Zone names in the hints field confirms TAR is working.
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests